Abstract. In the algebraic theory of codes and formal languages, the set
$Q$ of all primitive words over some alphabet $\Sigma$ has received special
interest. With this survey article we give an overview of the research
relevant to this topic during the last twenty years, including our own
investigations and some new results. In Section 1, after recalling the most
important notions from formal language theory, we illustrate the connection
between coding theory and primitive words by some facts. We define primitive
words as words having only a trivial representation as the power of another
word. Nonprimitive words (without the empty word) are exactly the periodic
words. Every nonempty word is a power of a uniquely determined primitive
word, which is called the root of the former one. The set of all roots
of nonempty words of a language is called the root of the language. The
primitive words have interesting combinatorial properties, which we consider
in Section 2. In Section 3 we investigate the relationship between
the set $Q$ of all primitive words over some fixed alphabet and the language
classes of the Chomsky hierarchy and the contextual languages
over the same alphabet. The computational complexity of the set $Q$ and
of the roots of languages is considered in Section 4. The set of all powers
of the same degree of all words from a language is the power of this
language. We examine the powers of languages for different sets of exponents,
and especially their regularity and context-freeness, in Section
5, and the decidability of appropriate questions in Section 6. Section 7
is dedicated to several generalizations of the notions of periodicity and
primitivity of words.
1 Preliminaries

1.1 Words and languages 

First, we repeat the most important notions which we will use in our paper. 

Let $\Sigma$ be a fixed alphabet, that is, a finite and nonempty
set of symbols. Mostly, we assume that it is a nontrivial alphabet, which
means that it has at least two symbols, which we denote by $a$ and $b$,
$a \neq b$. $\mathbb{N} = \{0, 1, 2, 3, \ldots\}$ denotes the set of all
natural numbers. $\Sigma^*$ is the free monoid generated by $\Sigma$, or the
set of all words over $\Sigma$. The number of letters of a word $p$, counted
with their multiplicities, is the length of the word $p$, denoted by
$|p|$. If $|p| = n$ and $n = 0$, then $p$ is the empty word, denoted by $e$
(in other papers also by $\varepsilon$ or $\lambda$). The set of words of
length $n$ over $\Sigma$ is denoted by $\Sigma^n$. Then
$\Sigma^* = \bigcup_{n \in \mathbb{N}} \Sigma^n$ and $\Sigma^0 = \{e\}$. For
the set of nonempty words over $\Sigma$ we will use the notation
$\Sigma^+ = \Sigma^* \setminus \{e\}$.

The concatenation of two words $p = x_1 x_2 \cdots x_m$ and
$q = y_1 y_2 \cdots y_n$, $x_i, y_j \in \Sigma$, is the word
$pq = x_1 x_2 \cdots x_m y_1 y_2 \cdots y_n$. We have $|pq| = |p| + |q|$.
The powers of a word $p \in \Sigma^*$ are defined inductively: $p^0 = e$, and
$p^n = p^{n-1} p$ for $n \geq 1$. $p^*$ denotes the set
$\{p^n : n \in \mathbb{N}\}$, and $p^+ = p^* \setminus \{e\}$.

For $p \in \Sigma^*$ and $1 \leq i \leq |p|$, $p[i]$ is the letter at the
$i$-th position of $p$. Then $p = p[1]p[2] \cdots p[|p|]$.

For words $p, q \in \Sigma^*$, $p$ is a prefix of $q$, in symbols
$p \sqsubseteq q$, if there exists $r \in \Sigma^*$ such that $q = pr$. $p$ is
a strict prefix of $q$, in symbols $p \sqsubset q$, if $p \sqsubseteq q$ and
$p \neq q$. $Pr(q) =_{Df} \{p : p \sqsubset q\}$ is the set of all strict
prefixes of $q$ (including $e$ if $q \neq e$).

$p$ is a suffix of $q$ if there exists $r \in \Sigma^*$ such that $q = rp$.

For an arbitrary set $M$, $|M|$ denotes the cardinality of $M$, and
$\mathcal{P}(M)$ denotes the set of all subsets of $M$.

A language over $\Sigma$, or a formal language over $\Sigma$, is a subset $L$
of $\Sigma^*$. $\{L : L \subseteq \Sigma^*\} = \mathcal{P}(\Sigma^*)$ is the
set of all languages over $\Sigma$. If $L$ is a nonempty strict subset of
$\Sigma^*$, $\emptyset \subset L \subset \Sigma^*$, then we call it a
nontrivial language.

For languages $L_1$, $L_2$, and $L$ we define:
$L_1 \cdot L_2 = L_1 L_2 =_{Df} \{pq : p \in L_1 \wedge q \in L_2\}$,
$L^0 =_{Df} \{e\}$, and $L^n =_{Df} L^{n-1} \cdot L$ for $n \geq 1$.

If one of $L_1$, $L_2$ is a one-element set $\{p\}$, then, usually, in
$L_1 L_2$ we write $p$ instead of $\{p\}$.

Languages can be classified in several ways, for instance according to the
Chomsky hierarchy, which we will assume the reader to be familiar with
(otherwise, see, for instance, [8, 9, 23]). These are the classes of regular,
context-free, context-sensitive, and enumerable languages, respectively.
Later on we will also consider linear languages and contextual languages and
define them in Section 3.

1.2 Periodic words, primitive words, and codes 

Two of the fundamental problems in the investigation of words and languages
are the questions of how a word can be decomposed and whether words are
powers of a common word. These occur, for instance, in coding theory and
in the longest repeating segment problem, which is one of the most important
problems of sequence comparison in molecular biology. The study of the
primitivity of sequences is often the first step towards understanding them.

We will give two definitions of periodic words and primitive words, respec- 
tively, and show some connections to coding theory. 

Definition 1 A word $u \in \Sigma^+$ is said to be periodic if there exists a
word $v \in \Sigma^*$ and a natural number $n \geq 2$ such that $u = v^n$. If
$u \in \Sigma^+$ is not periodic, then it is called a primitive word over
$\Sigma$.

Obviously, this definition is equivalent to the following. 

Definition 1' A word $u \in \Sigma^+$ is said to be primitive if it is not a
power of another word, that is, $u = v^n$ with $v \in \Sigma^*$ implies
$n = 1$ and $v = u$. If $u \in \Sigma^+$ is not primitive, then it is called a
periodic word over $\Sigma$.

Definition 2 The set of all periodic words over $\Sigma$ is denoted by
$Per(\Sigma)$, the set of all primitive words over $\Sigma$ is denoted by
$Q(\Sigma)$.

Obviously, $Q(\Sigma) = \Sigma^+ \setminus Per(\Sigma)$.
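Definition 1 can be turned directly into a test. The following Python sketch is only an illustration and not part of the original development; words are modelled as Python strings, and the function names `is_periodic` and `is_primitive` are our own choice.

```python
def is_periodic(u: str) -> bool:
    """Return True if u = v^n for some word v and some n >= 2 (Definition 1)."""
    length = len(u)
    for n in range(2, length + 1):        # candidate exponent n
        if length % n == 0:
            v = u[: length // n]          # the only possible base word for this n
            if v * n == u:
                return True
    return False


def is_primitive(u: str) -> bool:
    """Return True if u is nonempty and not periodic."""
    return len(u) > 0 and not is_periodic(u)
```

For instance, `is_primitive("aba")` holds, whereas `"abab"` is periodic because it equals `"ab" * 2`. The empty word is neither periodic nor primitive in this sketch, in agreement with both notions being defined for nonempty words only.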

In the sequel, if $\Sigma$ is understood, and for simplicity, instead of
$Per(\Sigma)$ and $Q(\Sigma)$ we will write $Per$ and $Q$, respectively.
Now we cite some fundamental definitions from coding theory. 

Definition 3 A nonempty set $C \subseteq \Sigma^*$ is called a code if every
equation $u_1 u_2 \cdots u_m = v_1 v_2 \cdots v_n$ with $u_i, v_j \in C$ for
all $i$ and $j$ implies $n = m$ and $u_i = v_i$ for all $i$.

A nonempty set $C \subseteq \Sigma^*$ is called an $n$-code for
$n \in \mathbb{N}$ if every nonempty subset of $C$ with at most $n$ elements
is a code. A nonempty set $C \subseteq \Sigma^+$ is called an intercode if
there is some $m \geq 1$ such that
$C^{m+1} \cap \Sigma^+ C^m \Sigma^+ = \emptyset$.

Connections to primitive words are stated by the following theorems. 
Theorem 4 If $C \subseteq \Sigma^+$ and for all $p, q \in C$ with $p \neq q$
it holds that $pq \in Q$, then $C$ is a 2-code.

The proof will be given in Section 2. 

Theorem 5 If $C$ is an intercode, then $C \subseteq Q$.

Proof. Assume that $C \not\subseteq Q$ is an intercode and
$C^{m+1} \cap \Sigma^+ C^m \Sigma^+ = \emptyset$ for some $m \geq 1$. Then we
have a periodic word $u$ in $C$, which means $u = v^n \in C$ for some
$v \in \Sigma^+$ and $n \geq 2$. Then
$u^{m+1} = v^{n(m+1)} = v (v^n)^m v^{n-1} \in C^{m+1} \cap \Sigma^+ C^m \Sigma^+$,
which is a contradiction. □

1.3 Roots of words and languages 

Every nonempty word $p \in \Sigma^+$ is either the power of a shorter word $q$
(if it is periodic) or it is not a power of another word (if it is primitive).
The shortest word $q$ with this property (in the first case), or $p$ itself
(in the second case), is called the root of $p$.

Definition 6 The root of a word $p \in \Sigma^+$ is the unique primitive word
$q$ such that $p = q^n$ for some, also unique, natural number $n$. It is
denoted by $\sqrt{p}$ or $root(p)$. The number $n$ in this equation is called
the degree of $p$, denoted by $deg(p)$. For a language $L$,
$\sqrt{L} =_{Df} \{\sqrt{p} : p \in L \wedge p \neq e\}$ is the root of $L$,
and $deg(L) =_{Df} \{deg(p) : p \in L \wedge p \neq e\}$ is the degree of $L$.

Remark. The uniqueness of root and degree is obvious, a formal proof will 
be given in Section 2. 

Corollary 7 $p = \sqrt{p}^{\,deg(p)}$ for each word $p \neq e$;
$\sqrt{L} \subseteq Q$ for each language $L$; $\sqrt{\Sigma^*} = Q$;
$\sqrt{L} = L$ if and only if $L \subseteq Q$.
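Root and degree are easy to compute for concrete words. The sketch below is an illustration under our own naming (`root_and_degree`, `root_of_language`), with words as Python strings and a finite language given as a set of strings; it uses the fact that the shortest prefix $q$ of $p$ with $p = q^n$ is primitive.

```python
def root_and_degree(p: str) -> tuple[str, int]:
    """Return (root, degree) of a nonempty word p, so that p == root * degree."""
    if not p:
        raise ValueError("root and degree are defined for nonempty words only")
    n = len(p)
    for d in range(1, n + 1):             # candidate root length, shortest first
        if n % d == 0 and p[:d] * (n // d) == p:
            return p[:d], n // d
    raise AssertionError("unreachable: d = n always succeeds")


def root_of_language(words: set[str]) -> set[str]:
    """Root of a finite language: the roots of its nonempty words."""
    return {root_and_degree(p)[0] for p in words if p}
```

For example, `root_and_degree("abab")` gives `("ab", 2)`, and the finite language `{"", "aa", "abab", "ab"}` has the root `{"a", "ab"}`.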

2 Primitivity and combinatorics on words 

Combinatorics on words is a fundamental part of the theory of words and
languages. It is profoundly connected to numerous different fields of
mathematics and its applications, and it emphasizes the algorithmic nature of
problems on words. Its objects are elements of a finitely generated free
monoid, and therefore combinatorics on words is a part of noncommutative
discrete mathematics. For its comprehensive results and its influence on
coding theory and primitive words we refer to the textbooks of Yu [28],
Shyr [24], Lothaire [19], and to Chapter 6 in [23]. Here we summarize some
results from this theory which are important for studying primitive words or
which will be used later. The following theorem was first proved for elements
of a free monoid.

Theorem 8 (Lyndon and Schützenberger [20]). If $pq = qp$ for nonempty
words $p$ and $q$, then $p$ and $q$ are powers of a common word and therefore
$pq$ is not primitive.

Proof. We prove the theorem by induction on the length of $pq$, which is at
least 2. For $|pq| = 2$ and $pq = qp$, $p, q \neq e$, we must have
$p = q = a$ for some $a \in \Sigma$, and the conclusion is true. Now suppose
the theorem is true for all $pq$ with $|pq| \leq n$ for a fixed $n \geq 2$.
Let $|pq| = n + 1$, $pq = qp$, $p, q \neq e$, and, without loss of
generality, $|p| \leq |q|$. We have a situation as in Figure 1. There
must exist $x \in \Sigma^*$ such that $q = px = xp$.

Case 1) $x = e$. Then $p = q$, and the conclusion is true.

Case 2) $x \neq e$. Since $|px| \leq n$, by the induction hypothesis $p$ and
$x$ are powers of a common word. Then $q$ is also a power of this common word.

The theorem follows by induction. □

Corollary 9 $w \notin Q$ if and only if there exist $p, q \in \Sigma^+$ such
that $w = pq = qp$.
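Corollary 9 suggests a second primitivity test for nonempty words. A decomposition $w = pq = qp$ with nonempty $p$ and $q$ exists exactly if $w$ occurs inside $ww$ at a position strictly between $0$ and $|w|$. The Python sketch below (our own illustration, with hypothetical function names) implements both the direct search for such a decomposition and the substring formulation.

```python
def is_periodic_by_commutation(w: str) -> bool:
    """Search for a split w = pq with p, q nonempty and pq == qp (Corollary 9)."""
    return any(w == w[k:] + w[:k] for k in range(1, len(w)))


def is_periodic_by_search(w: str) -> bool:
    """Equivalent test: w occurs in ww at some position k with 0 < k < |w|."""
    # str.find(w, 1) returns the first occurrence at index >= 1; the occurrence
    # at index |w| always exists, so only a strictly smaller index counts.
    return len(w) > 0 and (w + w).find(w, 1) < len(w)
```

Both functions return `True` for `"abab"` (split `"ab"`, `"ab"`) and `False` for `"aba"`. The second variant delegates the work to a single substring search.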

Theorem 10 (Shyr and Thierrin [25]) For words $p, q \in \Sigma^*$, the
two-element set $\{p, q\}$ is a code if and only if $pq \neq qp$.

Proof. First note that both statements in the theorem imply that
$p, q \neq e$ and $p \neq q$. It is trivial that for a code $\{p, q\}$,
$pq \neq qp$ must hold. Now we show that no set $\{p, q\}$ with $pq \neq qp$
can exist which is not a code. Assume the opposite. Then
$\mathcal{M} =_{Df} \{\{p, q\} : p, q \in \Sigma^* \wedge pq \neq qp \wedge \{p, q\} \text{ is not a code}\} \neq \emptyset$.
Let $\{p, q\} \in \mathcal{M}$ where $|pq|$ is minimal, and let $w$ be a word
with minimal length having two different representations over $\{p, q\}$.
Then $|w| > 2$ and one of the following must be true:

either (a) $w = pup = qu'q$ or (b) $w = pvq = qv'p$ for some
$u, u', v, v' \in \{p, q\}^*$. Because of $p \neq q$, $p \sqsubset q$ or
$q \sqsubset p$ must follow. Let us assume that $p \sqsubset q$. For the case
$q \sqsubset p$ the proof can be carried out symmetrically. Then from both
(a) and (b) it follows that $q = pr = sp$ for some $r, s \in \Sigma^+$. We
have $|r| = |s| \neq |p|$ (because otherwise $r = s = p$ and $q = pp$),
$|pr| < |pq|$, and $pr \neq rp$ (because otherwise $r = s$ and
$pq = psp = prp = qp$). With $q = pr$ follows either (a') $pup = pru'pr$
from (a), or (b') $pvpr = prv'p$ from (b). Because of $|pr| < |pq|$, the
choice of $\{p, q\}$ having minimal length, and the definition of
$\mathcal{M}$, it must follow that $\{p, r\}$ is a code. But then from both
(a') and (b') follows $p = r$, which is a contradiction. Hence $\mathcal{M}$
must be empty. □

From the last two theorems we get the following corollary, which in turn
proves Theorem 4.

Corollary 11 If $pq \in Q$ for words $p, q \neq e$, then $\{p, q\}$ is a code.

Note that the converse of this corollary is not true. For example,
$\{aba, b\}$ is a code, but $abab \notin Q$.

A weaker variant of the next theorem was also proved by Lyndon
and Schützenberger [20] for elements of a free monoid. Our proof follows that
presented by Lothaire [19].

Theorem 12 (Fine and Wilf [7]) Let $p$ and $q$ be nonempty words, $|p| = n$,
$|q| = m$, and let $d = \gcd(n, m)$ be the greatest common divisor of $n$ and
$m$. If $p^i$ and $q^j$ for some $i, j \in \mathbb{N}$ have a common prefix
$u$ of length $n + m - d$, then $p$ and $q$ are powers of a common word of
length $d$ and therefore $\sqrt{p} = \sqrt{q}$.

Proof. Assume that the premises of the theorem are fulfilled and, without
loss of generality, $1 \leq n \leq m - 1$ (otherwise $n = m = d$ and
$p = q = u$). We first assume $d = 1$ and show that $p$ and $q$ are powers of
a common letter. Because of $u \sqsubseteq p^i$ and $|u| = m - 1 + n$ we have

(1) $u[x] = u[x + n]$ for $1 \leq x \leq m - 1$.

Because of $u \sqsubseteq q^j$ we have

(2) $u[y] = u[y + m]$ for $1 \leq y \leq n - 1$.

Because of (1) and $1 \leq m - n \leq m - 1$ we have

(3) $u[m] = u[m - n]$.

Let now $1 \leq x, y \leq m - 1$ with $y - x \equiv n \bmod m$. Then we have
two cases.

Case a). $y = x + n \leq m - 1$, and therefore $u[x] = u[y]$ by (1).

Case b). $y = x + n - m$. Since $x \leq m - 1$ we have
$x + n - m \leq n - 1$ and $u[y] = u[x + n - m] = u[x + n] = u[x]$ by (2)
and (1).

Hence $u[x] = u[y]$ whenever $1 \leq x, y \leq m - 1$ and
$y - x \equiv n \bmod m$. It follows by (1) that $u[x] = u[y]$ whenever
$1 \leq x, y \leq m - 1$ and $y - x \equiv k \cdot n \bmod m$ for some
$k \in \mathbb{N}$. Because of $\gcd(n, m) = 1$, the latter is true if
$y - x$ is any value of $\{1, 2, \ldots, m - 1\}$. This means, under
inclusion of (3), $u[1] = u[2] = \cdots = u[m]$, and $p$ and $q$ are powers
of the letter $u[1]$.

If $d > 1$, we argue in exactly the same way, assuming $\Sigma^d$ instead of
$\Sigma$ as the alphabet. □
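The bound $n + m - d$ of Theorem 12 can be explored on small examples. The following Python sketch is our own illustration (the helper names are hypothetical): it compares the periodic continuations of $p$ and $q$ letter by letter and checks whether they agree on the first $n + m - \gcd(n, m)$ positions.

```python
from math import gcd


def primitive_root(w: str) -> str:
    """Shortest word r such that w is a power of r (w must be nonempty)."""
    n = len(w)
    for d in range(1, n + 1):
        if n % d == 0 and w[:d] * (n // d) == w:
            return w[:d]
    raise ValueError("the empty word has no root")


def common_prefix_len(p: str, q: str, limit: int) -> int:
    """Length of the longest common prefix of ppp... and qqq..., capped at limit."""
    k = 0
    while k < limit and p[k % len(p)] == q[k % len(q)]:
        k += 1
    return k


def fine_wilf_bound(p: str, q: str) -> int:
    """The length n + m - gcd(n, m) required by Theorem 12."""
    return len(p) + len(q) - gcd(len(p), len(q))


def premise_holds(p: str, q: str) -> bool:
    """True if suitable powers of p and q share a prefix of the required length."""
    bound = fine_wilf_bound(p, q)
    return common_prefix_len(p, q, bound) >= bound
```

For `p = "abab"` and `q = "ab"` the premise holds and both words have the root `"ab"`. For `p = "abaab"` and `q = "aba"` the bound is 7, but the powers agree only on the first 6 letters, and the roots indeed differ; so in this instance a common prefix of length $n + m - d - 1$ is not sufficient.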

If we assume $p^i = q^j$ for primitive words $p$ and $q$ and
$i, j \in \mathbb{N} \setminus \{0\}$, then by Theorem 12, $p$ and $q$ are
powers of a common word, which can only be $p = q$ itself because of its
primitivity. This means the uniqueness of the root of a word, which also
implies the uniqueness of its degree.

Using Theorem 12 we can easily prove the next theorem. 

Theorem 13 (Borwein) If $w \notin Q$ and $wa \notin Q$, where
$w \in \Sigma^+$ and $a \in \Sigma$, then $w \in a^+$.

The next theorem is among the most frequently cited properties concerning
primitive words.

Theorem 14 (Shyr and Thierrin [26]) If $u_1 u_2 \neq e$ and $u_1 u_2 = p^i$
for some $p \in Q$, then $u_2 u_1 = q^i$ for some $q \in Q$. This means, if
$u = u_1 u_2 \neq e$ and $u' = u_2 u_1$, then $deg(u) = deg(u')$,
$|\sqrt{u}| = |\sqrt{u'}|$, and therefore $u$ is primitive if and only if
$u'$ is primitive.

Proof. Let $u_1 u_2 = p^i \neq e$ and $p \in Q$. We consider two cases.

Case 1). $i = 1$, which means that $u_1 u_2$ is primitive. Assume that
$u_2 u_1$ is not primitive and therefore $u_2 u_1 = q^j$ for some $q \in Q$
and $j \geq 2$. Then $q = q_1 q_2 \neq e$ such that
$u_2 = (q_1 q_2)^n q_1$, $u_1 = q_2 (q_1 q_2)^m$, and $j = n + m + 1$. It
follows that $u_1 u_2 = (q_2 q_1)^{m+n+1} = (q_2 q_1)^j$ is not primitive. By
this contradiction, $u_2 u_1$ is primitive.

Case 2). $i \geq 2$. Then $p = p_1 p_2 \neq e$ such that
$u_1 = (p_1 p_2)^n p_1$, $u_2 = p_2 (p_1 p_2)^m$, and $i = n + m + 1$. Since
$p = p_1 p_2$ is primitive, by Case 1 also $q =_{Df} p_2 p_1$ is primitive,
and $u_2 u_1 = (p_2 p_1)^{n+m+1} = q^i$. □
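Theorem 14 says that all cyclic shifts of a word share the degree and the root length. A small Python sketch (our own illustration, with hypothetical helper names) makes this visible by collecting the pairs $(|\sqrt{u'}|, deg(u'))$ over all conjugates $u' = u_2 u_1$ of a word $u = u_1 u_2$.

```python
def root_and_degree(w: str) -> tuple[str, int]:
    """Return (root, degree) of a nonempty word w."""
    n = len(w)
    for d in range(1, n + 1):
        if n % d == 0 and w[:d] * (n // d) == w:
            return w[:d], n // d
    raise ValueError("root and degree are defined for nonempty words only")


def conjugates(u: str) -> list[str]:
    """All words u2 u1 with u = u1 u2 (one entry per split position)."""
    return [u[k:] + u[:k] for k in range(len(u))]


def degree_profile(u: str) -> set[tuple[int, int]]:
    """Set of (root length, degree) pairs occurring among the conjugates of u."""
    return {(len(r), d) for r, d in map(root_and_degree, conjugates(u))}
```

By Theorem 14 the returned set always has exactly one element for a nonempty word. For `"abaaba"` it is `{(3, 2)}`: the conjugates `"baabaa"` and `"aabaab"` are squares of the primitive words `"baa"` and `"aab"`.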

The proof of the following theorem, which was first given by Lyndon and
Schützenberger [20] for a free group, is rather difficult and therefore
omitted here.

Theorem 15 If $u^m v^n = w^k \neq e$ for words $u, v, w \in \Sigma^*$ and
natural numbers $m, n, k \geq 2$, then $u$, $v$ and $w$ are powers of a
common word.

We say that the equation $u^m v^n = w^k$, where $m, n, k \geq 2$, has only
trivial solutions.

The next two theorems are consequences of Theorem 15. 

Theorem 16 If $p, q \in Q$ with $p \neq q$, then $p^i q^j \in Q$ for all
$i, j \geq 2$.

This theorem is not true if $i = 1$ or $j = 1$. For instance, let $p = aba$,
$q = baab$, $i = 2$, $j = 1$. Then $p^2 q = abaababaab = (abaab)^2$.

Theorem 17 If $p, q \in Q$ with $p \neq q$ and $i \geq 1$, then there are at
most two periodic words in each of the languages $p^i q^*$ and $p^* q^i$.

Proof. Assume that there are periodic words in $p^i q^*$, and let $p^i q^j$
be the smallest of them. Then $p^i q^j = r^k$ for some $r \in Q$, $k \geq 2$,
$r \neq q$. Let also $p^i q^l = s^m$ be periodic, $s \in Q$, $l > j$,
$m \geq 2$. Then $s^m = r^k q^{l-j}$, and $l - j = 1$ by Theorem 15.
Therefore at most two words $p^i q^j$ and $p^i q^{j+1}$ in $p^i q^*$ can be
periodic. For $p^* q^i$ the proof is done analogously. □

With considerably more effort, the following can be shown.

Theorem 18 (Shyr and Yu [27, 28]) If $p, q \in Q$ with $p \neq q$, then there
is at most one periodic word in the language $p^+ q^+$.
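The statements of Theorems 16 to 18 can be observed experimentally. The Python sketch below is our own illustration (the function names are hypothetical); it lists all exponent pairs $(i, j)$ up to a chosen bound for which $p^i q^j$ is periodic. It is a finite search and proves nothing beyond the bound.

```python
def is_periodic(w: str) -> bool:
    """True if w is a power v^n with n >= 2, i.e. w has a proper period dividing |w|."""
    n = len(w)
    return any(n % d == 0 and w[:d] * (n // d) == w for d in range(1, n))


def periodic_pairs(p: str, q: str, max_exp: int) -> list[tuple[int, int]]:
    """All (i, j) with 1 <= i, j <= max_exp such that p^i q^j is periodic."""
    return [
        (i, j)
        for i in range(1, max_exp + 1)
        for j in range(1, max_exp + 1)
        if is_periodic(p * i + q * j)
    ]
```

For the primitive words `p = "aba"` and `q = "baab"` used as a counterexample after Theorem 16, the call `periodic_pairs("aba", "baab", 6)` finds only the pair `(2, 1)`, corresponding to `"abaababaab" == "abaab" * 2`, as Theorem 18 predicts.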

3 Primitivity and language classes 

As soon as the set $Q$ of primitive words (over a fixed alphabet $\Sigma$) was
defined, the question arose what the exact relationship is between $Q$ and
several known language classes. Here it is important that $\Sigma$ is a
nontrivial alphabet, because otherwise all results become trivial or
meaningless: if $\Sigma = \{a\}$ then $Q(\Sigma) = \Sigma = \{a\}$ and
$Per(\Sigma) = \{a^n : n \geq 2\}$.

First we will examine the relationship of Q to the classes of the Chomsky 
hierarchy, and second that to the Marcus contextual languages. 

3.1 Chomsky hierarchy 

Let us denote by REG, CF and CS the class of all regular languages, the class
of all context-free languages and the class of all context-sensitive languages
(all over the nontrivial alphabet $\Sigma$), respectively. It is known from
the Chomsky hierarchy that REG $\subset$ CF $\subset$ CS (see, e.g., the
textbooks [8, 9, 23]). It is easy to show that $Q \in$ CS $\setminus$ REG,
and hence the question remains whether $Q$ is context-free. Before stating
the theorem, let us recall that CF is the class of languages which are
acceptable by nondeterministic pushdown automata, and CS is the class of
languages which are acceptable by nondeterministic linear bounded automata.
The latter are Turing machines where the space used on the tapes (that is,
the number of tape cells touched by the head) is bounded by a constant
multiple of the length of the input string. If the accepting automaton
is a deterministic one, the corresponding language is called a deterministic
context-free or a deterministic context-sensitive language, respectively. It
can be shown that the deterministic context-free languages are a strict
subclass of the context-free languages, whereas it is not yet known whether
this inclusion is also strict in the case of context-sensitive languages
(this is the famous LBA problem).

Theorem 19 Q is deterministic context-sensitive but not regular. 

Proof. 1. It is easy to see that a deterministic Turing machine can check for
a given word $u$ whether it fulfills Definition 1, and thus whether it is
periodic or primitive, and that this can be done in space which is a constant
multiple of $|u|$.

2. That $Q$ is not regular is a corollary of the next theorem.

Theorem 20 A language containing only a bounded number of primitive words 
and having an infinite root cannot be regular. 

If $Q$ were regular, then also its complement
$\overline{Q} = Per \cup \{e\}$ would be regular, because the class of regular
languages is closed under complementation. But $\overline{Q}$ contains no
primitive word and $\sqrt{\overline{Q}} = Q$ is infinite, and therefore by
Theorem 20 it cannot be regular. □

Proof of Theorem 20. Let $L$ be a language with an infinite root and a
bounded number of primitive words. Further let

$m =_{Df} \max(\{|p| : p \in L \cap Q\} \cup \{0\})$. Assume that $L$ is
regular. By the pumping lemma for regular languages, there exists a natural
number $n \geq 1$ such that any word $u \in L$ with $|u| \geq n$ has the form
$u = xyz$ such that $|xy| \leq n$, $y \neq e$, and $xy^k z \in L$ for all
$k \in \mathbb{N}$. Let now $u \in L$ with $|\sqrt{u}| > n$ and $|u| > m$.
Then $u = xyz$ such that $1 \leq |y| \leq |xy| \leq n$, $z \neq e$, and
$xy^k z \in L$ for all $k \in \mathbb{N}$. By Theorem 14, for each
$k \geq 1$, $zxy^k$ is periodic (since $|xy^k z| \geq |u| > m$). Let
$p =_{Df} \sqrt{zx}$, $i =_{Df} deg(zx)$, and $q =_{Df} \sqrt{y}$. We have
$p \neq q$ because otherwise, by Theorem 14,
$|\sqrt{u}| = |\sqrt{zxy}| = |\sqrt{y}| \leq |y| \leq n$, contradicting the
assumption $|\sqrt{u}| > n$. Then we have infinitely many periodic words in
$p^i q^*$, contradicting Theorem 17. □
In 1991 it was conjectured by Dömösi, Horváth and Ito [4] that $Q$ is not
context-free. Even though up to now all attempts to prove or disprove this
conjecture have failed, it is mostly assumed to be true. Some approximations
to the solution of this problem are given by the following theorems.

Theorem 21 Q is not deterministic context-free. 

Proof. We use the fact that the class of deterministic context-free languages
is closed under complementation and under intersection with regular sets.
Assume that $Q$ is deterministic context-free. Then also
$\overline{Q} \cap a^* b^* a^* b^* = \{a^i b^j a^i b^j : i, j \in \mathbb{N}\}$
must be deterministic context-free. But using the pumping lemma for
context-free languages, it can be shown that the latter is not even
context-free. □

In the same way (using the pumping lemma for $Per \cap a^* b^* a^* b^*$) it
also follows that $Per$ is not context-free.

The next theorem has a rather difficult proof. For this reason, and because we
will not explain what unambiguity means, we omit the proof.

Theorem 22 (Petersen [22]) $Q$ is not an unambiguous context-free language.

Another interesting language class which is strictly between the context-free 
and the regular languages is the class LIN of all linear languages. 

Definition 23 A grammar $G = [N, T, P, S]$ is linear if its productions are of
the form $A \to aB$ or $A \to Ba$ or $A \to a$, where $a \in T$ and
$A, B \in N$. A production of the form $S \to e$ can also be accepted if the
start symbol $S$ does not occur on the right-hand side of any production.

A linear language is a language which can be generated by a linear gram- 
mar. LIN is the class of all linear languages. 

It can be shown that REG $\subset$ LIN $\subset$ CF.

Theorem 24 (Horváth [10]) $Q$ is not a linear language.

The proof can be done by using a special pumping lemma for linear lan- 
guages and will be omitted here. 

Let $\mathcal{L}$ be the union of the classes of linear languages, unambiguous
context-free languages and deterministic context-free languages. Then
$\mathcal{L} \subset$ CF and, by the preceding theorems,
$Q \notin \mathcal{L}$. But whether $Q \in$ CF or not is still unknown.

3.2 Contextual languages

Though we do not know the exact position of Q in the Chomsky Hierarchy, its 
position in the system of contextual languages is clear. First, we cite the basic 
definitions from [21], see also [15], and then, after three examples we prove our 
result. 

Definition 25 A (Marcus) contextual grammar is a structure
$G = [\Sigma, A, C, \varphi]$ where $\Sigma$ is an alphabet, $A$ is a finite
subset of $\Sigma^*$ (called the set of axioms), $C$ is a finite subset of
$\Sigma^* \times \Sigma^*$ (called the set of contexts), and $\varphi$ is a
function from $\Sigma^*$ into $\mathcal{P}(C)$ (called the choice function).
If $\varphi(u) = C$ for every $u \in \Sigma^*$ then $G$ is called a (Marcus)
contextual grammar without choice.

With such a grammar the following relations on $\Sigma^*$ are associated: For
$w, w' \in \Sigma^*$,

(1) $w \Rightarrow_{ex} w'$ if and only if there exists
$[p_1, p_2] \in \varphi(w)$ such that $w' = p_1 w p_2$,

(2) $w \Rightarrow_{in} w'$ if and only if there exist
$w_1, w_2, w_3 \in \Sigma^*$ and $[p_1, p_2] \in \varphi(w_2)$ such that
$w = w_1 w_2 w_3$ and $w' = w_1 p_1 w_2 p_2 w_3$.

$\Rightarrow^*_{ex}$ and $\Rightarrow^*_{in}$ denote the reflexive and
transitive closure of these two relations.

Definition 26 For a contextual grammar $G = [\Sigma, A, C, \varphi]$ (with or
without choice),

$\mathcal{L}_{ex}(G) =_{Df} \{w : \exists u (u \in A \wedge u \Rightarrow^*_{ex} w)\}$
is the external contextual language (with or without choice) generated by $G$,

and
$\mathcal{L}_{in}(G) =_{Df} \{w : \exists u (u \in A \wedge u \Rightarrow^*_{in} w)\}$
is the internal contextual language (with or without choice) generated by $G$.

For every contextual grammar $G = [\Sigma, A, C, \varphi]$,
$A \subseteq \mathcal{L}_{ex}(G) \subseteq \mathcal{L}_{in}(G)$ holds.

The above definitions are illustrated by the following examples. 

Example 1 Let $G = [\Sigma, A, C, \varphi]$ be a contextual grammar where
$\Sigma = \{a, b\}$, $A = \{e, ab\}$, $C = \{[e, e], [a, b]\}$,
$\varphi(e) = \{[e, e]\}$, $\varphi(ab) = \{[a, b]\}$ and
$\varphi(w) = \emptyset$ if $w \notin A$. Then
$\mathcal{L}_{ex}(G) = \{e, ab, aabb\}$ and
$\mathcal{L}_{in}(G) = \{a^n b^n : n \in \mathbb{N}\}$, since
$ab \Rightarrow_{ex} aabb$, $ab \Rightarrow^*_{in} a^n b^n$ for every
$n \geq 1$, and there does not exist any $w'$ such that
$aabb \Rightarrow_{ex} w'$.
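The two derivation relations of Definition 25 can be simulated for the grammar of Example 1. The Python sketch below is our own illustration: the choice function is hard-coded for this one grammar, the empty word $e$ is the empty string, and the closure is cut off at a maximal word length `max_len`, which is an assumption of the sketch (the languages themselves need not be finite).

```python
def phi(w: str) -> set[tuple[str, str]]:
    """Choice function of Example 1: contexts allowed around the word w."""
    if w == "":
        return {("", "")}
    if w == "ab":
        return {("a", "b")}
    return set()


def external_step(w: str) -> set[str]:
    """All w' with w =>_ex w': wrap a context around the whole word."""
    return {p1 + w + p2 for p1, p2 in phi(w)}


def internal_step(w: str) -> set[str]:
    """All w' with w =>_in w': wrap a context around some factor w2 of w."""
    result = set()
    for i in range(len(w) + 1):
        for j in range(i, len(w) + 1):
            for p1, p2 in phi(w[i:j]):
                result.add(w[:i] + p1 + w[i:j] + p2 + w[j:])
    return result


def closure(axioms: set[str], step, max_len: int) -> set[str]:
    """Words derivable from the axioms, restricted to length <= max_len."""
    seen = set(axioms)
    frontier = list(axioms)
    while frontier:
        w = frontier.pop()
        for v in step(w):
            if len(v) <= max_len and v not in seen:
                seen.add(v)
                frontier.append(v)
    return seen
```

With the axioms `{"", "ab"}`, `closure(..., external_step, 8)` yields `{"", "ab", "aabb"}`, while `closure(..., internal_step, 8)` yields the words $a^n b^n$ for $0 \leq n \leq 4$, in line with the two languages stated in Example 1.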

Example 2 Let $G = [\Sigma, A, C, \varphi]$ be a contextual grammar where
$\Sigma = \{a, b\}$, $A = \{a\}$, $C = \{[e, e], [e, a], [e, b]\}$,
$\varphi(e) = \{[e, e]\}$, $\varphi(ua) = \{[e, b]\}$ for $u \in \Sigma^*$ and
$\varphi(ub) = \{[e, a]\}$ for $u \in \Sigma^*$.
Then $\mathcal{L}_{ex}(G) = \{a, ab, aba, abab, \ldots\} = a(ba)^* \cup a(ba)^* b$
and $\mathcal{L}_{in}(G) = a\Sigma^* \setminus aa\Sigma^*$.

Example 3 Let $u = a_1 a_2 a_3 \cdots$ be an $\omega$-word over a nontrivial
alphabet $\Sigma$ where $a_i \in \Sigma$ for all $i \geq 1$. Let
$G = [\Sigma, A, C, \varphi]$ be a contextual grammar where $A = \{e, a_1\}$,
$C = \{[e, e]\} \cup \{[e, a] : a \in \Sigma\}$, $\varphi(e) = \{[e, e]\}$,
$\varphi(a_1 a_2 \cdots a_i) = \{[e, a_{i+1}]\}$ and $\varphi(w) = \emptyset$
if $w$ is not a prefix of $u$. Then
$\mathcal{L}_{ex}(G) = \{e, a_1, a_1 a_2, a_1 a_2 a_3, \ldots\} = Pr(u)$ is
the set of all prefixes of $u$.
Hence, there exist contextual grammars generating languages which are not
recursively enumerable.

Theorem 27 (Ito [5]) Q is an external contextual language with choice but 
not an external contextual language without choice or an internal contextual 
language with or without choice. 

Proof. 1. Let
$G = [\Sigma, \Sigma, \{[u, v] : uv \in \Sigma^* \wedge |uv| \leq 2\}, \varphi]$
be a contextual grammar, where
$\varphi(w) = \{[u, v] : uv \in \Sigma^* \wedge |uv| \leq 2 \wedge uwv \in Q\}$
for every $w \in \Sigma^*$. Then obviously $\mathcal{L}_{ex}(G) \subseteq Q$.
We prove $Q \subseteq \mathcal{L}_{ex}(G)$ by induction. First we have
$\Sigma \subseteq (\Sigma \cup \Sigma^2) \cap Q \subseteq \mathcal{L}_{ex}(G)$.
Now assume that for a fixed $n \geq 2$ all primitive words $p$ with
$|p| \leq n$ are in $\mathcal{L}_{ex}(G)$. Let $u$ be a primitive word of
smallest length $\geq n + 1$. We have two cases.

Case a). $u = w x_1 x_2$ with $x_1, x_2 \in \Sigma$ and at least one of $w$
and $w x_1$ is in $Q$. Then, by the induction hypothesis,
$w \in \mathcal{L}_{ex}(G)$ or $w x_1 \in \mathcal{L}_{ex}(G)$. But then
$w \Rightarrow_{ex} w x_1 x_2$ or $w x_1 \Rightarrow_{ex} w x_1 x_2$, and thus
$u \in \mathcal{L}_{ex}(G)$.

Case b). $u = w x_1 x_2$ with $x_1, x_2 \in \Sigma$ and none of $w$ and
$w x_1$ is in $Q$. Then, by Theorem 13, $w = x_1^i$ for some $i \geq 1$, hence
$u = x_1^{i+1} x_2$ with $x_1 \neq x_2$, and
$x_2 \Rightarrow_{ex} x_1 x_2 \Rightarrow_{ex} x_1 x_1 x_2 \Rightarrow_{ex} \cdots \Rightarrow_{ex} x_1^{i+1} x_2$,
and therefore $u \in \mathcal{L}_{ex}(G)$.

2. Assume that there exists a contextual grammar
$G = [\Sigma, A, C, \varphi]$ without choice such that
$Q = \mathcal{L}_{ex}(G)$. There must be at least one pair
$[u, v] \in \varphi(w)$ with $uv \neq e$ for all $w \in \Sigma^*$. Let
$p = \sqrt{vu}$ and $i = deg(vu) \geq 1$. Because of
$p \in Q = \mathcal{L}_{ex}(G)$, also $upv$ would be in
$\mathcal{L}_{ex}(G)$. We have $vup = p^{i+1}$. By Theorem 14,
$deg(upv) = deg(vup) = i + 1 \geq 2$ and therefore $upv \notin Q$, which
is a contradiction.

3. Assume $Q = \mathcal{L}_{in}(G)$ for some contextual grammar
$G = [\Sigma, A, C, \varphi]$ (with or without choice). There must be words
$u, v, w \in \Sigma^*$ with $uv \neq e$ and $[u, v] \in \varphi(w)$. Let
$n = |uwv|$ and $a, b \in \Sigma$ with $a \neq b$. Then
$a^n b^n w a^n b^n uwv \in Q$, but
$a^n b^n w a^n b^n uwv \Rightarrow_{in} a^n b^n uwv a^n b^n uwv = (a^n b^n uwv)^2 \notin Q$,
contradicting $\mathcal{L}_{in}(G) = Q$. □



Theorem 28 Per is not a contextual language of any kind. 
Proof. Assume $Per = \mathcal{L}_{ex}(G)$ or $Per = \mathcal{L}_{in}(G)$ for
some contextual grammar $G = [\Sigma, A, C, \varphi]$ (with or without
choice). Let $m$ be a fixed number with
$m > \max\{|p| : p \in A \vee \exists u([p, u] \in C \vee [u, p] \in C)\}$.
Because $a^m b^m a^m b^m \in Per$ we must have $q \in Per$ such that
$q \Rightarrow_{ex} a^m b^m a^m b^m$ or $q \Rightarrow_{in} a^m b^m a^m b^m$.
In the first case, $q = a^i b^m a^m b^j$ with $i < m \vee j < m$ must follow.
But then $q \notin Per$. In the second case, $q = a^i b^j a^k b^l$ with
$i < m \vee j < m \vee k < m \vee l < m$ must follow. But then
$qq \Rightarrow_{in} a^m b^m a^m b^m a^i b^j a^k b^l \notin Per$ whereas
$qq \in Per$. Therefore $Per \neq \mathcal{L}_{ex}(G)$ and
$Per \neq \mathcal{L}_{in}(G)$. □

4 Primitivity and complexity 

Investigating the computational complexity of $Q$ and that of roots of
languages is interesting in itself. In addition, because
$Q = \sqrt{\Sigma^*}$, there was some hope of obtaining hints for solving the
problem of the context-freeness of $Q$. First, let us repeat some basic
notions from complexity theory.

If $M$ is a deterministic Turing machine, then $t_M$ is the time complexity
of $M$, defined as follows. If $p \in \Sigma^*$, where $\Sigma$ is the input
alphabet of $M$, and $M$ on input $p$ reaches a final state (we also say $M$
halts on $p$), then $t_M(p)$ is the number of computation steps required by
$M$ to halt. If $M$ does not halt on $p$, then $t_M(p)$ is undefined. For
natural numbers $n$,
$t_M(n) =_{Df} \max\{t_M(p) : p \in \Sigma^* \wedge |p| = n\}$ if $M$ halts on
each word of length $n$. If $t$ is a function over the natural numbers, then
TIME($t$) denotes the class of all sets which are accepted by multitape
deterministic Turing machines whose time complexity is bounded from above by
$t$. Restricting to one-tape machines, the time complexity class is denoted by
1-TIME($t$).

For simplicity, let us write TIME($n^2$) instead of the more exact notation
TIME($f$), where $f(n) = n^2$.

Theorem 29 (Horváth and Kudlek [12]) $Q \in$ 1-TIME($n^2$).

The proof, which will be omitted, is based on Corollary 9 and the linear
speed-up of time complexity. The latter means that
1-TIME($t'$) $\subseteq$ 1-TIME($t$) if $t' \in O(t)$ and $t(n) \geq n^2$ for
all $n$.

The time bound $n^2$ is optimal for accepting $Q$ (or $Per$) by one-tape
Turing machines, which is shown by the next theorem.

Theorem 30 ([17]) For each one-tape Turing machine $M$ deciding $Q$,
$t_M \in \Omega(n^2)$ must hold. The latter means:

$\exists c \exists n_0 (c > 0 \wedge n_0 \in \mathbb{N} \wedge \forall n (n \geq n_0 \to t_M(n) \geq c \cdot n^2))$.

The proof, which will also be omitted, uses the method of counting crossing
sequences, which is well known to complexity theorists.

Now we turn to the relationship between the complexity of a language and
that of its root. It turns out that there is no general relation; moreover,
there can be an arbitrarily large gap between the complexity of a language
and that of its root.

Theorem 31 ([17, 16]) Let $t$ and $f$ be arbitrary total functions over
$\mathbb{N}$ such that $t \in \omega(n)$ is monotone nondecreasing and $f$ is
monotone nondecreasing, unbounded, and time constructible. Then there exists a
language $L$ such that $L \in$ 1-TIME($O(t)$) but $\sqrt{L} \notin$ TIME($f$).

Instead of the proof, which is somewhat complicated, we only explain the
notions occurring in the theorem. $t \in \omega(n)$ means
$\lim_{n \to \infty} \frac{n}{t(n)} = 0$. A time constructible function is a
function $f$ for which there is a Turing machine halting in exactly $f(n)$
steps on every input of length $n$ for each $n \in \mathbb{N}$. One can show
that the most common functions have these properties. Finally,
1-TIME($O(t)$) $= \bigcup \{$1-TIME($t'$) $: \exists c \exists n_0 (c > 0 \wedge n_0 \in \mathbb{N} \wedge \forall n (n \geq n_0 \to t'(n) \leq c \cdot t(n)))\}$.

Let us still remark, that from Theorem 31 we can deduce that there exist 
regular languages the roots of which are not even context-sensitive, see [15, 16]. 

5 Powers of languages 

In arithmetic, powers are in some sense counterparts to roots. Also for formal
languages we can define powers, and here, too, we shall establish some
connections to roots. The power $pow(L)$ of a language $L$ was first
defined by Calbrix and Nivat in [3] in connection with the study of properties
of period and prefix languages of $\omega$-languages. They also raised the
problem of characterizing those regular languages whose powers are also
regular, and of deciding whether a given regular language has this property.
Cachat [2] gave a partial solution to this problem, showing that for a regular
language $L$ over a one-letter alphabet, it is decidable whether $pow(L)$ is
regular. He also suggested considering as the set of exponents not only the
whole set $\mathbb{N}$ of natural numbers but also an arbitrary regular set of
natural numbers. This suggestion was taken up in [13] with the next
definition.

Definition 32 For a language $L \subseteq \Sigma^*$ and a natural number
$k \in \mathbb{N}$, $L^{(k)} =_{Df} \{p^k : p \in L\}$. For
$H \subseteq \mathbb{N}$,
$pow_H(L) =_{Df} \bigcup_{k \in H} L^{(k)} = \{p^k : p \in L \wedge k \in H\}$
is the $H$-power of $L$.

Instead of $pow_H(L)$ we also write $L^{(H)}$, and it is also usual to write
$pow(L)$ instead of $pow_{\mathbb{N}}(L) = L^{(\mathbb{N})}$.

Note the difference between $L^{(k)}$ and $L^k$. For instance, if
$L = \{a, b\}$ then $L^{(2)} = \{aa, bb\}$, $L^2 = \{aa, ab, ba, bb\}$ and
$L^{(\mathbb{N})} = a^* \cup b^*$.

We say that a set $H$ of natural numbers has some language theoretical
property if the corresponding one-symbol language
$\{a^k : k \in H\} = \{a\}^{(H)}$, which is isomorphic to $H$, has this
property.

It is easy to see that every regular power of a regular language is context- 
sensitive. More generally, we have the following theorem. 

Theorem 33 ([13]) If $H \subseteq \mathbb{N}$ is context-sensitive and
$L \in$ CS, then also $pow_H(L) = L^{(H)}$ is context-sensitive.

Proof. Let $L \subseteq \Sigma^*$ be context-sensitive and also
$H \subseteq \mathbb{N}$ be context-sensitive. By the following algorithm, for
a given word $u \in \Sigma^*$ we can decide whether $u \in L^{(H)}$.

1  if ($u \in L \wedge 1 \in H$) $\vee$ ($u = e \wedge 0 \in H$)

2      then return "$u$ is in $L^{(H)}$"

3      else compute $p = \sqrt{u}$ and $d = deg(u)$

4           for $i \leftarrow 1$ to $\lfloor d/2 \rfloor$

5               do if $p^i \in L \wedge d/i \in H$

6                      then return "$u$ is in $L^{(H)}$"

7  return "$u$ is not in $L^{(H)}$"

$\lfloor d/2 \rfloor$ in line 4 is $d/2$ if $d$ is even, and $(d-1)/2$ if $d$
is odd. Each step of the algorithm can be done by a linear bounded automaton
or by a Turing machine where the used space is bounded by a constant multiple
of $|u|$. Crucial for this is that $|p| \leq |u|$, $d \leq |u|$, and that the
decisions in line 1 and in line 5 can also be done by a linear bounded
automaton with this bound, because $L$ and $H$ are context-sensitive and
therefore acceptable by linear bounded automata. □
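The decision procedure from the proof of Theorem 33 can be sketched in Python. This is only an illustration under our own assumptions: the language $L$ and the exponent set $H$ are passed as membership predicates `in_L` and `in_H` (hypothetical names), the test $d/i \in H$ is read as requiring that $i$ divides $d$, and the empty word is handled by line 1 only, as in the algorithm above. Space bounds are not modelled.

```python
def root_and_degree(u: str) -> tuple[str, int]:
    """Return (root, degree) of a nonempty word u."""
    n = len(u)
    for d in range(1, n + 1):
        if n % d == 0 and u[:d] * (n // d) == u:
            return u[:d], n // d
    raise ValueError("root and degree are defined for nonempty words only")


def in_power(u: str, in_L, in_H) -> bool:
    """Decide u in L^(H), following lines 1-7 of the algorithm in the proof."""
    # Lines 1-2: exponent 1, or the empty word obtained with exponent 0.
    if (in_L(u) and in_H(1)) or (u == "" and in_H(0)):
        return True
    if u == "":
        return False                      # assumption: no root to compute for e
    # Line 3: u = p^d with p primitive.
    p, d = root_and_degree(u)
    # Lines 4-6: u = (p^i)^(d/i) for every divisor i of d with d/i >= 2.
    for i in range(1, d // 2 + 1):
        if d % i == 0 and in_L(p * i) and in_H(d // i):
            return True
    # Line 7.
    return False
```

With `in_L = lambda w: w in {"a", "b"}` and `in_H = lambda k: k == 2` the sketch accepts `"aa"` and rejects `"ab"` and `"aaaa"`, in agreement with $L^{(2)} = \{aa, bb\}$ for $L = \{a, b\}$.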

The last theorem raises the question whether and when $L^{(H)}$ is in a
smaller class of the Chomsky hierarchy, especially if $L$ is regular. This
essentially depends on whether the root of $L$ is finite or not. Therefore we
introduce the notation FR for the class of all regular languages $L$ such that
$\sqrt{L}$ is finite, and IR $=_{Df}$ REG $\setminus$ FR for the class of all
regular languages $L$ such that $\sqrt{L}$ is infinite.
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Theorem 34 ([13]) The class FR of regular sets having a finite root is closed 
under the power with finite sets. 

Proof. Let L be a regular language with a finite root $\{p_1, \dots, p_k\}$ and $e \notin L$,
and let $L_i =_{Df} L \cap p_i^*$ for each $i \in \{1, \dots, k\}$. Since $L_i \subseteq p_i^*$ and $L_i \in \mathrm{REG}$, $L_i$
is isomorphic to a regular set $M_i$ of natural numbers, namely $M_i = \deg(L_i)$.
For each $n \in \mathbb{N}$, $M_i \cdot n =_{Df} \{m \cdot n : m \in M_i\}$ is regular too. Therefore, for a
finite set $H \subseteq \mathbb{N}$, also $\bigcup_{n \in H} M_i \cdot n$ is regular, which is isomorphic to $L_i^{(H)}$. Then
$L^{(H)} = \bigcup_{i=1}^{k} L_i^{(H)}$ is regular, and $\sqrt{L^{(H)}} = \sqrt{L}$ is finite. If the empty word is in
the language then, because of $\mathrm{pow}_H(L \cup \{e\}) = \mathrm{pow}_H(L) \cup \{e\}$, we get the same
result. □
If H is infinite then $\mathrm{pow}_H(L)$ may be nonregular and even non-context-free.
This is true even in the case of a one-letter alphabet where the root of each
nonempty set (except $\{e\}$) has exactly one element. This is illustrated by the
following example.

Let $L = \{a^{2m+3} : m \in \mathbb{N}\}$. Then $L \in \mathrm{FR}$ but
$L^{(\mathbb{N})} = \{a^k : k \in \mathbb{N} \setminus \{2^m : m \geq 0\}\} \notin \mathrm{CF}$.
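
The exponent set in this example can be checked numerically. The sketch below assumes that L consists of the words $a^n$ with n odd and $n \geq 3$ (our reading of the example) and works with exponents instead of words; the function name `full_power_exponents` and the bound 64 are illustrative choices.

```python
def full_power_exponents(base_exponents, bound):
    """Exponents j <= bound with a^j in pow(L), where L = {a^n : n in base_exponents}."""
    result = set()
    for n in base_exponents:
        if n == 0:
            result.add(0)  # the empty word only yields the empty word
            continue
        # (a^n)^k = a^(n*k) for every k in N, including k = 0
        result.update(n * k for k in range(bound // n + 1))
    return result

bound = 64
odd_from_three = range(3, bound + 1, 2)
powers_of_two = {2 ** m for m in range(bound.bit_length())}
missing = set(range(bound + 1)) - full_power_exponents(odd_from_three, bound)
print(sorted(missing))  # exactly the powers of two up to the bound
```

A number is a multiple of an odd number greater than 1 exactly when it is not a power of two, which is why only the exponents $2^m$ are missing.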

Therefore it remains a problem to characterize those regular sets L with finite
roots where $\mathrm{pow}_H(L)$ is regular for any (maybe regular) set H.

Our next theorem shows that the powers of arbitrary (not necessarily reg- 
ular) languages which have infinite roots are not regular, even more, they are 
not even context-free, if the exponent set is an arbitrary set of natural numbers 
containing at least one element which is greater than 2 and does not contain 
the number 1, or some other properties are fulfilled. 

Theorem 35 ([13]) For every language L which has an infinite root and for
every set $H \subseteq \mathbb{N}$ containing at least one number greater than 2, $\mathrm{pow}_H(L)$ is
not context-free if one of the following conditions is true:

(a) $1 \notin H$,
(b) $\sqrt{L} \in \mathrm{REG}$,
(c) $L \cap \sqrt{L} \in \mathrm{REG}$,
(d) $L \in \mathrm{REG}$ and $\sqrt{L^{(H)} \setminus L}$ is infinite.

Proof. Let $L \subseteq \Sigma^*$ be a language such that $\sqrt{L}$ is infinite, and let $H \subseteq \mathbb{N}$ with
$H \setminus \{0, 1, 2\} \neq \emptyset$. We define

$$L' =_{Df} \begin{cases}
\mathrm{pow}_H(L) & \text{if (a) is true,} \\
\mathrm{pow}_H(L) \setminus \sqrt{L} & \text{if (b) is true,} \\
\mathrm{pow}_H(L) \setminus (L \cap \sqrt{L}) & \text{if (c) is true,} \\
\mathrm{pow}_H(L) \setminus L & \text{if (d) is true.}
\end{cases}$$

If more than one of the conditions (a), (b), (c), (d) are true simultaneously,
then it doesn't matter which of the appropriate lines in the definition of $L'$ we
choose. It is important that in each case, $\sqrt{L'}$ is infinite, there is no primitive
word in $L'$ and, if $\mathrm{pow}_H(L)$ was context-free then also $L'$ would be context-free.
But we show that the latter is not true.

Assume that $L'$ is context-free, and let $n \geq 3$ be a fixed number from H. By
the pumping lemma for context-free languages, there exists a natural number
m such that every $z \in L'$ with $|z| > m$ is of the form $w_1 w_2 w_3 w_4 w_5$ where
$w_2 w_4 \neq e$, $|w_2 w_3 w_4| \leq m$, and $w_1 w_2^i w_3 w_4^i w_5 \in L'$ for all $i \in \mathbb{N}$.
Now let $z \in L'$ with $\deg(z) \geq n$ and $|\sqrt{z}| > 2m$, which exists because $\sqrt{L'}$
is infinite. Let $p =_{Df} \sqrt{z}$ and $k =_{Df} \deg(z)$. Then $|z| = k \cdot |p| > 2km$. By
the pumping lemma, $z = p^k = w_1 w_2 w_3 w_4 w_5$ where $w_2 w_4 \neq e$, $|w_2 w_3 w_4| \leq
m < \frac{|p|}{2}$, and $w_1 w_2^i w_3 w_4^i w_5 \in L'$ for each $i \in \mathbb{N}$. Especially, for $i = 0$, $x =_{Df}
w_1 w_3 w_5 \in L'$ and therefore x is nonprimitive. Now let $z' =_{Df} w_5 w_1 w_2 w_3 w_4$,
$q =_{Df} \sqrt{z'}$, $x' =_{Df} w_5 w_1 w_3$, and $s =_{Df} \sqrt{x'}$. By Theorem 14 we have $\deg(z') =
\deg(z) = k$ and $x'$ nonprimitive, therefore $|q| = |p| > 2m$ and $|s| \leq \frac{|x'|}{2}$.
It follows $z' = q^k$ and $x' = q^{k-1} q'$ for some word $q'$ with $\frac{|q|}{2} < |q'| < |q|$
(because of $0 < |w_2 w_4| \leq |w_2 w_3 w_4| < \frac{|q|}{2}$). The words $z'$ and $x'$, which are
powers of q and s, respectively, have a common prefix $w_5 w_1$ of length $|z| -
|w_2 w_3 w_4| > k \cdot |q| - \frac{|q|}{2}$. Because of $|s| \leq \frac{|x'|}{2} < \frac{k}{2} \cdot |q|$ and $k \geq 3$, we have
$|q| + |s| < (\frac{k}{2} + 1)|q| \leq (k - \frac{1}{2})|q|$, and therefore $q = s$ by Theorem 12. But then
$x' = s^{k-1} q'$ with $0 < |q'| < |s|$, which contradicts $\sqrt{x'} = s$. □

It remains to examine whether the H-power of a regular language is regular or
context-free or neither, if $H = \mathbb{N}$ or $H \subseteq \{0, 1, 2\}$. First, we consider the
exceptions 0, 1, and 2, where we find a different behavior.



Theorem 36 ([13]) (i) For each $L \in \mathrm{REG}$ and $H \subseteq \{0, 1\}$, $L^{(H)} \in \mathrm{REG}$.
(ii) For each $L \in \mathrm{FR}$, $L^{(2)} \in \mathrm{FR}$.
(iii) For each $L \in \mathrm{IR}$, $L^{(2)} \notin \mathrm{REG}$.

Proof. (i) is trivial, (ii) follows from Theorem 34, and (iii) follows from Theorem
20. □

A set $\mathrm{pow}_{\{2\}}(L) = L^{(2)}$ we also call the square of L. Because of the former
theorem, only the squares of regular languages with infinite roots remain of
interest. In contrast to the former results, where the power of a regular set
either is regular again or not context-free, this is not true for the squares. It
is illustrated by the following examples:

Let $L_1 =_{Df} a \cdot \{b\}^*$ and $L_2 =_{Df} \{a, b\}^*$. Then both $L_1$ and $L_2$ are regular with
infinite roots, but $L_1^{(2)} \in \mathrm{CF}$ and $L_2^{(2)} \notin \mathrm{CF}$.

To characterize those regular languages whose squares are context-free we 
introduce the following notion. 

Definition 37 Let $p \in Q$ and $w, w' \in \Sigma^*$ such that p is not a suffix of w and
$w'w \notin p^+$. The sets $w p^* w'$ and $p^* w p^*$ are called inserted iterations
of the primitive word p. The words $p, w, w'$ are called the modules of
$w p^* w'$, and $p, w$ are called the modules of $p^* w p^*$. A FIP-set is a finite
union $L_1 \cup \dots \cup L_n$ of inserted iterations of primitive words. The sets $L_1, \dots, L_n$
are also called the components of the FIP-set.

Using this notion we can give the following reformulation and simplification 
of a theorem by Ito and Katsura from 1991 (see [14]) which has a rather 
difficult proof. 

Theorem 38 If $L^{(2)} \in \mathrm{CF}$ and $L^{(2)} \subseteq Q^{(2)}$ then L must be a subset of a
FIP-set.

Using this theorem and the proof idea from Theorem 35 we can show the 
following characterization. 

Theorem 39 ([18]) For a regular language L, $L^{(2)}$ is context-free if and only
if L is a subset of a FIP-set.

Proof. We show here only one direction. Let L be regular and $L^{(2)} \in \mathrm{CF}$.
We consider three cases. Case a). $L \in \mathrm{FR}$. Let $\sqrt{L} = \{p_1, \dots, p_n\}$. Then $L \subseteq
p_1^* \cup \dots \cup p_n^*$ and $p_1^* \cup \dots \cup p_n^*$ is a FIP-set.

Case b). $L \in \mathrm{IR}$ and $\sqrt{L \cap Per}$ is infinite. This means, L has infinitely many
periodic words with altogether infinitely many roots of unbounded lengths.
Then $L^{(2)}$ contains words z with $|\sqrt{z}| > 2m$ for arbitrary m and $\deg(z) \geq 4$.
If $L^{(2)}$ were context-free then we would get the same contradiction as in
the proof of Theorem 35. Therefore case b) cannot occur.

Case c). $L \in \mathrm{IR}$ and $\sqrt{L \cap Per}$ is finite. Let $L_1 =_{Df} L \cap Q$, $L_2 =_{Df} L \cap \overline{Q}$,
and $\sqrt{L_2} = \{p_1, \dots, p_k\}$. Then $L = L_1 \cup L_2$, $L_1 \cap L_2 = \emptyset$, and
$L_2 = ((p_1^* \cup \dots \cup p_k^*) \setminus \{p_1, \dots, p_k\}) \cap L$ is in FR. Therefore also $L_2^{(2)} \in \mathrm{FR}$
by Theorem 36, and $L_1^{(2)} \in \mathrm{CF}$ because $L^{(2)} = L_1^{(2)} \cup L_2^{(2)} \in \mathrm{CF}$. We have
$L_1^{(2)} \subseteq Q^{(2)}$, and by Theorem 38 it follows that $L_1$ is a subset of a FIP-set. $L_2$ is
a subset of a FIP-set by case a), and so is $L = L_1 \cup L_2$. □

Now it is easy to clarify the situation for the n-th power of a regular or
even context-free set for an arbitrary natural number n, where it is trivial
that $L^{(0)} = \{e\}$ and $L^{(1)} = L$.

Theorem 40 ([18]) For an arbitrary context-free language L and a natural
number $n \geq 2$, if $L^{(n)}$ is context-free, then either $n \geq 3$ and $L \in \mathrm{FR}$ or $n = 2$
and $L \cap Per \in \mathrm{FR}$.

Proof. If $n \geq 3$ and $\sqrt{L}$ is infinite then $L^{(n)} \notin \mathrm{CF}$ by Theorem 35. It is
well-known that every context-free language over a single-letter alphabet is
regular. Using this fact it is easy to show that every context-free language
with finite root is regular too. Therefore, if $\sqrt{L}$ is finite and $L \in \mathrm{CF}$ then
$L \in \mathrm{FR}$, and $L^{(n)} \in \mathrm{FR}$ by Theorem 34. If $n = 2$, $L^{(2)} \in \mathrm{CF}$ and $\sqrt{L}$ is infinite,
then $L \cap Per \in \mathrm{FR}$ must be true by the proof of Theorem 39. □

Now we consider the full power $\mathrm{pow}(L) = \mathrm{pow}_{\mathbb{N}}(L)$ for a regular language L.

Theorem 41 (Fazekas [6]) For a regular language L, $\mathrm{pow}(L)$ is regular if and
only if $\mathrm{pow}(L) \setminus L \in \mathrm{FR}$.

Proof. If $\mathrm{pow}(L) \setminus L \in \mathrm{FR} \subseteq \mathrm{REG}$ then $(\mathrm{pow}(L) \setminus L) \cup L = \mathrm{pow}(L) \in \mathrm{REG}$
because the class of regular languages is closed under union. For the opposite
direction assume $\mathrm{pow}(L) \in \mathrm{REG}$. Then also $L' =_{Df} \mathrm{pow}(L) \setminus L$ is regular
because the class of regular languages is closed under difference of two sets.
There are no primitive words in $L'$ and therefore, by Theorem 20, it must have
a finite root. □



6 Decidability questions 

Questions about the decidability of several properties of sets or decidability 
of problems belong to the most important questions in (theoretical) computer 
science. Here we consider the decidability of properties of languages regarding 
their roots and powers. We will cite the most important theorems in chrono- 
logical order of their proofs but we omit the proofs because of their complexity. 

Theorem 42 (Horvath and Ito [11]) For a context-free language L it is
decidable whether $\sqrt{L}$ is finite.

Theorem 43 (Cachat [2]) For a regular or context-free language L over a
single-letter alphabet it is decidable whether $\mathrm{pow}(L)$ is regular.

Using Cachat's algorithm, Horvath showed (but has not yet published) the
following.

Theorem 44 (Horvath) For a regular or context-free language L with finite
root it is decidable whether $\mathrm{pow}(L)$ is regular.

Remark. Since the context-free languages with finite root are exactly the 
languages in FR (Remark in the proof of Theorem 40), it doesn't matter 
whether we speak of regularity or context-freeness in the last theorems. 

Remarkable in this connection is also the only negative decidability result 
by Bordihn. 

Theorem 45 (Bordihn [1]) For a context-free language L with infinite root it
is not decidable whether $\mathrm{pow}(L)$ is context-free.

The problem of Calbrix and Nivat [3] and the open question of Cachat [2] for
languages over any finite alphabet and almost any sets of exponents, but not
for all, was answered in [13]. Especially the regularity of $\mathrm{pow}(L)$ for a regular
set L remained open, but it was conjectured that the latter is decidable. Using
these papers, finally Fazekas [6] could prove this conjecture.

Theorem 46 (Fazekas [6]) For a regular language L it is decidable whether
$\mathrm{pow}(L)$ is regular.

Finally, we look at the squares of regular and context-free languages. 

Theorem 47 ([18]) For a regular language L it is decidable whether $L^{(2)}$ is
regular or context-free or neither.

Proof. Let L be a regular language generated by a right-linear grammar $G =
[\Sigma, N, S, R]$ and let $m = |N| + 1$. By Theorem 36, $L^{(2)}$ is regular if and only if
$\sqrt{L}$ is finite. The latter is decidable by Theorem 42. If $\sqrt{L}$ is infinite then by
Theorem 39, $L^{(2)}$ is context-free if and only if L is a subset of a FIP-set. If L is
a subset of a FIP-set then we can show that there exists a FIP-set F such that
$L \subseteq F$ and all modules of all components of F have lengths smaller than m.
Thus there are only finitely many words which can be modules and only finitely
many inserted iterations of primitive words having these modules. The latter
can be effectively computed. Let $L_1, \dots, L_n$ be all these inserted iterations of
primitive words. Then $L^{(2)}$ is context-free if and only if $L \subseteq L_1 \cup \dots \cup L_n$,
which is equivalent to $L \cap \overline{L_1 \cup \dots \cup L_n} = \emptyset$. The latter is decidable for
regular languages L and $L_1, \dots, L_n$. □

Figure 2: Concatenation with overlap



7 Generalizations of periodicity and primitivity 

If u is a periodic word then we have a strict prefix v of u such that u is
exhausted by concatenation of two or more copies of v, $u = v^n$, $n \geq 2$ (see Figure
3). But it could be that such an exhaustion is not completely possible: there
may remain a strict prefix of v, and the rest of v overhangs u, i.e. $u = v^n v'$,
$n \geq 2$, $v' \sqsubset v$ (see Figure 4). In this case we call u semi-periodic. A third
possibility is to exhaust u by concatenation of two or more copies of v where
several consecutive copies may overlap (see Figure 5). In this case we speak
about quasi-periodic words. If a nonempty word is not periodic, semi-periodic,
or quasi-periodic, respectively, we call it a primitive, strongly primitive, or
hyperprimitive word, respectively. Of course, periodic and primitive words are
those we considered before in this paper. Finally, we can combine the
possibilities to get three further types which we will summarize in the forthcoming
Definition 49. Before doing so, we give a formal definition of concatenation
with overlaps. All these generalizations have been introduced and investigated
in detail in [15]. Most of the material in this section is taken from there.

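Definition 48 For $p, q \in \Sigma^*$, we define
$p \otimes q =_{Df} \{w_1 w_2 w_3 : w_1 w_3 \neq e \wedge w_1 w_2 = p \wedge w_2 w_3 = q\}$,
$p^{\otimes 0} =_{Df} \{e\}$, $p^{\otimes (k+1)} =_{Df} \bigcup \{w \otimes p : w \in p^{\otimes k}\}$ for $k \in \mathbb{N}$,
$A \otimes B =_{Df} \bigcup \{p \otimes q : p \in A \wedge q \in B\}$ for sets $A, B \subseteq \Sigma^*$.

The following example shows that in general, $p \otimes q$ is a set of words:
Let $p = aabaa$. Then $p \otimes p = p^{\otimes 2} = \{aabaaaabaa, aabaaabaa, aabaabaa\}$.
We can illustrate this by Figure 2.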
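
The concatenation with overlap from Definition 48 is easy to compute for concrete words. The following Python sketch is an illustration only; the names `overlap_concat` and `overlap_power` are assumptions made here, and the empty word is represented by the empty string.

```python
def overlap_concat(p, q):
    """p (x) q: all words w1 w2 w3 with w1 w2 = p, w2 w3 = q and w1 w3 nonempty."""
    result = set()
    for k in range(min(len(p), len(q)) + 1):
        w2 = q[:k]                      # candidate overlap of length k
        if p[len(p) - k:] != w2:        # w2 must also be a suffix of p
            continue
        w1, w3 = p[:len(p) - k], q[k:]
        if w1 + w3 != "":               # the condition w1 w3 != e
            result.add(w1 + w2 + w3)
    return result

def overlap_power(p, n):
    """The n-fold overlap power of p, built by the recursion of Definition 48."""
    words = {""}                        # overlap power 0 is {e}
    for _ in range(n):
        words = set().union(*(overlap_concat(w, p) for w in words))
    return words

print(sorted(overlap_power("aabaa", 2)))
```

For $p = aabaa$ the three results correspond to the overlaps $e$, $a$ and $aa$; the full overlap $w_2 = p$ is excluded by the condition $w_1 w_3 \neq e$.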

In the following definition we repeat our Definitions 1 and 2 and give the 
generalizations suggested above. 

Figure 3: u is periodic, $u \in Per$, v = root(u)

Definition 49
$Per =_{Df} \{u : \exists v \exists n (v \sqsubset u \wedge n \geq 2 \wedge u = v^n)\}$ is the set of periodic words.
$Q =_{Df} \Sigma^+ \setminus Per$ is the set of primitive words.
$SPer =_{Df} \{u : \exists v \exists n (v \sqsubset u \wedge n \geq 2 \wedge u \in v^n \cdot Pr(v))\}$ is the set of semi-periodic words.
$SQ =_{Df} \Sigma^+ \setminus SPer$ is the set of strongly primitive words.
$QPer =_{Df} \{u : \exists v \exists n (v \sqsubset u \wedge n \geq 2 \wedge u \in v^{\otimes n})\}$ is the set of quasi-periodic words.
$HQ =_{Df} \Sigma^+ \setminus QPer$ is the set of hyperprimitive words.
$PSPer =_{Df} \{u : \exists v \exists n (v \sqsubset u \wedge n \geq 2 \wedge u \in \{v^n\} \otimes Pr(v))\}$ is the set of pre-periodic words.
$SSQ =_{Df} \Sigma^+ \setminus PSPer$ is the set of super strongly primitive words.
$SQPer =_{Df} \{u : \exists v \exists n (v \sqsubset u \wedge n \geq 2 \wedge u \in v^{\otimes n} \cdot Pr(v))\}$ is the set of semi-quasi-periodic words.
$SHQ =_{Df} \Sigma^+ \setminus SQPer$ is the set of strongly hyperprimitive words.
$QQPer =_{Df} \{u : \exists v \exists n (v \sqsubset u \wedge n \geq 2 \wedge u \in v^{\otimes n} \otimes Pr(v))\}$ is the set of quasi-quasi-periodic words.
$HHQ =_{Df} \Sigma^+ \setminus QQPer$ is the set of hyperhyperprimitive words.

The different kinds of generalized periodicity are illustrated in the Figures 3-8.
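
For short words, the first three kinds of periodicity can be tested directly. The sketch below is not taken from [15]; it rests on two assumptions that should be checked against the definitions: Pr(v) is taken to be the set of all prefixes of v (so that $u \in v^n \cdot Pr(v)$ with $n \geq 2$ means that u has a period $|v| \leq |u|/2$), and membership in $v^{\otimes n}$ for some $n \geq 2$ is tested by checking that the occurrences of a strict prefix v cover u without gaps. All function names are illustrative.

```python
def is_periodic(u):
    """u = v^n for some strict prefix v and n >= 2."""
    n = len(u)
    return any(n % d == 0 and u[:d] * (n // d) == u for d in range(1, n // 2 + 1))

def is_semi_periodic(u):
    """u = v^n v' with n >= 2 and v' a prefix of v (period d with 2d <= |u|)."""
    n = len(u)
    return any(all(u[i] == u[i - d] for i in range(d, n))
               for d in range(1, n // 2 + 1))

def covers(v, u):
    """True if the occurrences of v in u overlap or touch and exhaust u."""
    end = 0                              # u[:end] is covered so far
    for i in range(len(u) - len(v) + 1):
        if u.startswith(v, i):
            if i > end:                  # a gap between two occurrences
                return False
            end = i + len(v)
    return end == len(u)

def is_quasi_periodic(u):
    """Some nonempty strict prefix v of u covers u."""
    return any(covers(u[:d], u) for d in range(1, len(u)))

for word in ["abab", "ababa", "aabaaabaaba"]:
    print(word, is_periodic(word), is_semi_periodic(word), is_quasi_periodic(word))
```

For example, ababa is semi-periodic but not periodic, and aabaaabaaba is covered by its prefix aaba although it has no period of length at most 5.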



Theorem 50 The sets from Definition 49 have the inclusion structure as 
given in Figure 9. The lines in this figure denote strict inclusion from bottom 
to top. Sets which are not connected by such a line are incomparable under 
inclusion. 

Figure 4: u is semi-periodic, $u \in SPer$, v = sroot(u)

Figure 5: u is quasi-periodic, $u \in QPer$, v = hroot(u)

Figure 6: u is pre-periodic, $u \in PSPer$, v = ssroot(u)

Figure 7: u is semi-quasi-periodic, $u \in SQPer$, v = shroot(u)

Figure 8: u is quasi-quasi-periodic, $u \in QQPer$, v = hhroot(u)

Figure 9: Inclusion structure



Proof. Because of the duality between the sets, it is enough to prove the
left structure in Figure 9. Let $u \in SPer$, that is, $u = v^n q$ where $n \geq 2$
and $q \sqsubset v$. Thus $v = qr$ for some $r \in \Sigma^*$ and $u = (qr)^n q \in (qrq)^{\otimes n}$, and
therefore $u \in QPer$ and $SPer \subseteq QPer$. The remaining inclusions are clear by
the definition. To show the strictness of the inclusions we can use the following
examples:

$u_1 = abaababab$, $u_2 = aababaababaabaab$, $u_3 = aabaaabaaba$,
$u_4 = abaabab$, $u_5 = ababa$.

Then $u_1 \in QQPer \setminus (SQPer \cup PSPer)$, $u_2 \in SQPer \setminus QPer$,
$u_3 \in QPer \setminus PSPer$, $u_4 \in PSPer \setminus SQPer$, and $u_5 \in SPer \setminus Per$.
$u_3$ and $u_4$ also prove the incomparability. □

The six different kinds of periodicity resp. primitivity of words give rise to
six types of roots, where the first one is again that from Definition 6.

Definition 51 Let $u \in \Sigma^+$.
The shortest word v such that there exists a natural number n with
$u = v^n$ is called the root of u, denoted by root(u).
The shortest word v such that there exists a natural number n with
$u \in v^n \cdot Pr(v)$ is called the strong root of u, denoted by sroot(u).
The shortest word v such that there exists a natural number n with
$u \in v^{\otimes n}$ is called the hyperroot of u, denoted by hroot(u).
The shortest word v such that there exists a natural number n with
$u \in \{v^n\} \otimes Pr(v)$ is called the super strong root of u, denoted by ssroot(u).
The shortest word v such that there exists a natural number n with
$u \in v^{\otimes n} \cdot Pr(v)$ is called the strong hyperroot of u, denoted by shroot(u).
The shortest word v such that there exists a natural number n with
$u \in v^{\otimes n} \otimes Pr(v)$ is called the hyperhyperroot of u, denoted by hhroot(u).
If L is a language, then $root(L) =_{Df} \{root(p) : p \in L \wedge p \neq e\}$ is the root
of L. Analogously sroot(L), hroot(L), ssroot(L), shroot(L) and hhroot(L)
are defined.

The six kinds of roots are illustrated in the Figures 3-8 (if v is the shortest 
prefix with the appropriate property). 

root, sroot, hroot, ssroot, shroot and hhroot are word functions over
$\Sigma^+$, i.e., functions from $\Sigma^+$ to $\Sigma^+$. Generally, for word functions we define the
following partial ordering, also denoted by $\sqsubseteq$.
dom(f) for a function f denotes the domain of f.

Definition 52 For word functions f and g having the same domain,
$f \sqsubseteq g =_{Df} \forall u (u \in dom(f) \rightarrow f(u) \sqsubseteq g(u))$.

Theorem 53 The partial ordering $\sqsubseteq$ for the functions from Definition 51
is given in Figure 10.

Proof. It follows from the definition that for an arbitrary word $u \in \Sigma^+$ and its
roots we have the prefix relationship as shown in the figure. It remains to show
the strict prefixes and incomparability. This can be done, for instance, by the
following examples. Let $u_1 = abaabaababaabaabab$, $u_2 = abaabaabab$,
and $u_3 = abaababaabaabaab$. Then

$hhroot(u_1) = aba \sqsubset shroot(u_1) = abaab \sqsubset ssroot(u_1) = sroot(u_1) =
abaabaab \sqsubset hroot(u_1) = abaabaabab \sqsubset root(u_1) = u_1$,
$ssroot(u_2) = aba \sqsubset shroot(u_2) = abaab \sqsubset sroot(u_2) = abaabaab \sqsubset
hroot(u_2) = u_2$, and
$hroot(u_3) = abaab \sqsubset sroot(u_3) = abaababaaba$, which proves our figure.
□
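
Three of the six roots of the example words in this proof can be recomputed with a few lines of Python. The sketch is hedged: it assumes that the exponent n in Definition 51 may be 1 (so that u itself always qualifies), that Pr(v) contains all prefixes of v, and that hroot(u) is the shortest prefix of u whose occurrences cover u. Under these assumptions sroot(u) is the prefix whose length is the smallest period of u. The remaining three roots are not covered here.

```python
def root(u):
    """Shortest v with u = v^n."""
    n = len(u)
    for d in range(1, n + 1):
        if n % d == 0 and u[:d] * (n // d) == u:
            return u[:d]

def sroot(u):
    """Shortest v with u in v^n Pr(v): the prefix of smallest-period length."""
    n = len(u)
    for d in range(1, n + 1):
        if all(u[i] == u[i - d] for i in range(d, n)):
            return u[:d]

def covers(v, u):
    """True if the occurrences of v in u overlap or touch and exhaust u."""
    end = 0
    for i in range(len(u) - len(v) + 1):
        if u.startswith(v, i):
            if i > end:
                return False
            end = i + len(v)
    return end == len(u)

def hroot(u):
    """Shortest prefix v of u that covers u (u itself in the worst case)."""
    for d in range(1, len(u) + 1):
        if covers(u[:d], u):
            return u[:d]

for u in ["abaabaababaabaabab", "abaabaabab", "abaababaabaabaab"]:
    print(u, root(u), sroot(u), hroot(u))
```

The printed values can be compared with the chains given in the proof; in particular the second and third word show that sroot and hroot are incomparable.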

For most words u, some of the six roots coincide, and we have the question
how many roots of u are different, and whether there exist words u such that
all six roots of u are different from each other. This last question was
raised in [15], and it was first assumed that such words do not exist. But in 2010
Georg Lohmann discovered the first such word.

Definition 54 Let $k \in \{1, 2, 3, 4, 5, 6\}$. A word $u \in \Sigma^+$ is called a k-root
word if
$|\{root(u), sroot(u), hroot(u), ssroot(u), shroot(u), hhroot(u)\}| = k$.
A 6-root word is also called a Lohmann word.
u is called a strong k-root word if it is a k-root word and $root(u) \neq u$, that
is, it is a periodic k-root word.

Figure 10: Partial ordering of the root-functions (levels from top to bottom: root; sroot, hroot; ssroot, shroot; hhroot)

The following theorems give answers to our questions. The proofs are easy 
or will be published elsewhere. 

Theorem 55 The lexicographically smallest k-root words are a for k = 1,
aba for k = 2, ababa for k = 3, abaabaabab for k = 4,
abaabaababaabaabab for k = 5, and
ababaabababaababaababababaabab for k = 6.
The lexicographically smallest strong k-root words are aa for k = 1,
abaababaab for k = 2, (ab^abab^abab^)^ for k = 3, and
(ababaabababaabab)^ for k = 4.

Theorem 56 There exist no strong k-root words for k = 5 and k = 6. 

Theorem 57 Let v and w be words such that $e \sqsubset v \sqsubset w$, $wv \neq p^i$ for any
$p \sqsubset w$ and $i > 1$, and let $k_1, k_2, k_3$ be natural numbers with $2 \leq k_1 < k_2 < k_3 \leq
2k_1$. Then $u = w^{k_1} v w^{k_2} v w^{k_1} v w^{k_3} v w^{k_1}$ is a Lohmann word.

It is still open whether the sufficient condition in the last theorem is also a
necessary condition for Lohmann words.

Let us now examine whether the results from the former sections are also 
true for generalized periodicity and primitivity. First, we give generalizations 
of Corollary 9 and Theorem 13. For their proofs we refer to [15]. 

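Lemma 58 $w \notin SQ$ if and only if $w = pq = qr$ for some $p, q, r \in \Sigma^+$ and
$|q| \geq \frac{|w|}{2}$.

Lemma 59 If $aw \notin SQ$ and $wb \notin SQ$, where $w \in \Sigma^+$ and $a, b \in \Sigma$, then
$awb \notin SQ$.

Lemma 60 If $aw \notin HQ$ and $wb \notin HQ$, where $w \in \Sigma^+$ and $a, b \in \Sigma$, then
$awb \notin HQ$.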

Theorem 19 remains true for each of the sets from Definition 49. Theorems
21, 22, and 24 with their proofs carry over to each of the languages SQ,
HQ, SSQ, SHQ, and HHQ. Also the non-context-freeness of each of the sets
of generalized periodic words is simple to show, as remarked after Theorem 21. The
context-freeness of the sets of generalized primitive words is open, just as that
of Q.

Using Lemma 59 and Lemma 60 it can be shown that Theorem 27 is also 
true for SQ and HQ. Also none of SSQ, SHQ, HHQ, and the sets of generalized 
periodic words is a contextual language of any kind. 

Theorem 30 and its proof remain true for each of the sets from Definition 49. 
Theorem 29 is true for SQ where the proof uses Lemma 58. Whether the time 
bound is also optimal for accepting one of the remaining sets remains open. 
Theorem 31 and its proof remain true for each of the roots from Definition 51. 